55 research outputs found

    Overclock in Microprocessors

    Research work. This thesis addresses the topic of overclocking, which can be defined as a technique for obtaining higher performance from an electronic device such as a microprocessor. To this end, sub-zero cooling techniques are applied to remove the component's thermal limitation, thereby achieving higher operating frequencies, better performance, and results in less time compared with the same device operating under the manufacturer's normal specifications. Undergraduate thesis, Electronic Engineering.

    High Speed Clock Glitching

    In recent times, hardware security has drawn a lot of interest in the research community. With physical proximity to the target devices, various fault-injection hardware attack methods have been proposed and tested to alter their functionality and trigger behavior not intended by the design. Various types of faults can be injected depending on the parameters used and the level at which the device is tampered with. The literature describes various fault models for injecting faults into the clock of the target, but there are no publications on overclocking circuits for fault injection. The proposed method bridges this gap by conducting high-speed clock fault injection on the latest high-speed micro-controller units, where the target device is overclocked for a short duration in the range of 4-1000 ns. This thesis proposes a method for generating a high-speed clock and driving the target device with that clock. The target devices used for the experiments in this research must provide an externally accessible clock input line and a GPIO line. The proposed approach is to generate a high-speed clock using a custom bitstream sent to an FPGA and then use external analog circuitry to produce a clock glitch that can inject a fault into the target micro-controller. Communication coupled with glitching allows us to check the target's response, which can result in information disclosure. This is a non-invasive and effective form of hardware attack. The required background, methodology and experimental setup needed to implement high-speed clock glitching are discussed in this thesis. The impact of different overclock frequencies used in clock fault injection is explored. Preliminary results are discussed, and we show that even high-speed micro-controller units should consider countermeasures against clock fault injection. Influencing the execution of Tiva C Launchpad and STM32F4 micro-controller units is demonstrated in this thesis. The thesis also details the method used for the testing
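
    The abstract above describes driving a target's external clock input from an FPGA and glitching it for 4-1000 ns while checking the response over a communication link. As a rough illustration of that workflow, the Python sketch below sweeps hypothetical glitch parameters and records any response that differs from the expected one; the FpgaGlitcher and TargetLink classes are assumed placeholders, not interfaces from the thesis or any existing library.

    import itertools

    class FpgaGlitcher:
        # Stand-in for the FPGA loaded with the custom bitstream that briefly
        # overrides the target's external clock input with a much faster clock.
        def arm(self, offset_ns: int, width_ns: int) -> None:
            self.offset_ns, self.width_ns = offset_ns, width_ns  # would program glitch registers

    class TargetLink:
        # Stand-in for the serial/GPIO link used to trigger the protected
        # routine on the micro-controller and read back its response.
        EXPECTED = b"OK"
        def run_and_read(self) -> bytes:
            return self.EXPECTED  # a real faulted target may return corrupted data

    def sweep(glitcher: FpgaGlitcher, target: TargetLink):
        # Scan glitch widths across the 4-1000 ns range from the abstract and
        # record every parameter pair that changes the target's response.
        hits = []
        for offset_ns, width_ns in itertools.product(range(0, 500, 20), range(4, 1000, 8)):
            glitcher.arm(offset_ns, width_ns)
            response = target.run_and_read()
            if response != TargetLink.EXPECTED:
                hits.append((offset_ns, width_ns, response))
        return hits

    # Usage: interesting = sweep(FpgaGlitcher(), TargetLink())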

    Expert System for PC Overclocking


    Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems

    Computers have changed our lives beyond our own imagination in the past several decades. The continued and progressive advancements in VLSI technology and numerous micro-architectural innovations have played a key role in the design of spectacular low-cost, high-performance computing systems that have become omnipresent in today's technology-driven world. Performance and dependability have become key concerns as these ubiquitous computing machines continue to drive our everyday life. Every application has unique demands, as they run in diverse operating environments. Dependable, aggressive and adaptive systems improve efficiency in terms of speed, reliability and energy consumption. Traditional computing systems run at a fixed clock frequency, which is determined by taking into account the worst-case timing paths, operating conditions, and process variations. Timing-speculation-based reliable overclocking advocates going beyond worst-case limits to achieve the best performance while not avoiding, but detecting and correcting, a modest number of timing errors. The success of this design methodology relies on the fact that timing-critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case design methodology. Better-than-worst-case design is advocated by several recent research pursuits, which exploit dependability techniques to enhance computer system performance. In this dissertation, we address different aspects of timing-speculation-based adaptive reliable overclocking schemes, and evaluate their role in the design of low-cost, high-performance, energy-efficient and dependable systems. We visualize various control knobs in the design that can be favorably controlled to ensure different design targets. As part of this research, we extend the SPRIT3E (Superscalar PeRformance Improvement Through Tolerating Timing Errors) framework, and characterize the extent of application-dependent performance acceleration achievable in superscalar processors by scrutinizing the various parameters that impact operation beyond worst-case limits. We study the limitations imposed by short-path constraints on our technique, and present ways to exploit them to maximize performance gains. We analyze the sensitivity of our technique's adaptiveness by exploring the hardware requirements for dynamic overclocking schemes. Experimental analysis based on SPEC2000 benchmarks running on a SimpleScalar Alpha processor simulator, augmented with error-rate data obtained from hardware simulations of a superscalar processor, is presented. Even though reliable overclocking guarantees functional correctness, it leads to higher power consumption. As a consequence, reliable overclocking without considering on-chip temperatures will bring down the lifetime reliability of the chip. In this thesis, we analyze how reliable overclocking impacts the on-chip temperature of a microprocessor and evaluate the effects of overheating, due to such reliable dynamic frequency tuning mechanisms, on the lifetime reliability of these systems. We then evaluate the effect of thermal throttling, a technique that clamps the on-chip temperature below a predefined value, on system performance and reliability. Our study shows that a reliably overclocked system with dynamic thermal management achieves a 25% performance improvement while lasting for 14 years when operated within 353 K.
Over the past five decades, technology scaling, as predicted by Moore's law, has been the bedrock of semiconductor technology evolution. The continued downscaling of CMOS technology to deep sub-micron gate lengths has been the primary reason for its dominance in today's omnipresent silicon microchips. Even as the transition to the next technology node is indispensable, the initial cost and time associated with doing so present a non-level playing field for competitors in the semiconductor business. As part of this thesis, we evaluate the capability of speculative reliable overclocking mechanisms to maximize performance at a given technology level. We evaluate its competitiveness when compared to technology scaling, in terms of performance, power consumption, energy and energy-delay product. We present a comprehensive comparison for integer and floating-point SPEC2000 benchmarks running on a simulated Alpha processor at three different technology nodes in normal and enhanced modes. Our results suggest that adopting reliable overclocking strategies will help skip a technology node altogether, or remain competitive in the market while porting to the next technology node. Reliability has become a serious concern as systems embrace nanometer technologies. In this dissertation, we propose a novel fault-tolerant aggressive system that combines soft-error protection and timing-error tolerance. We replicate both the pipeline registers and the pipeline-stage combinational logic. The replicated logic receives its inputs from the primary pipeline registers while writing its output to the replicated pipeline registers. The organization of redundancy in the proposed Conjoined Pipeline system supports overclocking, provides concurrent error detection and recovery capability for soft errors, intermittent faults and timing errors, and flags permanent silicon defects. The fast recovery process requires no checkpointing and takes three cycles. Back-annotated post-layout gate-level timing simulations, using 45 nm technology, of a conjoined two-stage arithmetic pipeline and a conjoined five-stage DLX pipeline processor with forwarding logic show that our approach, even under a severe fault-injection campaign, achieves near 100% fault coverage and an average performance improvement of about 20% when dynamically overclocked.
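
    The dissertation's central idea is a feedback loop: run beyond the worst-case frequency, watch the detected-and-corrected timing-error rate, and clamp the on-chip temperature below a limit such as the 353 K mentioned above. A minimal sketch of one such control step is given below; the thresholds, step size and interfaces are illustrative assumptions rather than parameters from the SPRIT3E framework.

    TEMP_LIMIT_K = 353.0     # thermal clamp used in the study above
    ERROR_RATE_LIMIT = 0.01  # assumed tolerable timing-error rate
    F_STEP_MHZ = 25.0        # assumed frequency adjustment step

    def adjust_frequency(f_mhz: float, f_worst_case_mhz: float,
                         error_rate: float, temp_k: float) -> float:
        # One control interval of adaptive reliable overclocking: never drop
        # below the worst-case-safe frequency, throttle on heat, back off when
        # error-recovery overhead grows, and otherwise keep raising the clock.
        if temp_k >= TEMP_LIMIT_K:
            return max(f_worst_case_mhz, f_mhz - 2 * F_STEP_MHZ)  # thermal throttling
        if error_rate > ERROR_RATE_LIMIT:
            return max(f_worst_case_mhz, f_mhz - F_STEP_MHZ)      # too many timing errors
        return f_mhz + F_STEP_MHZ  # critical paths rarely exercised: speed up

    # Usage: f_next = adjust_frequency(3200.0, 2800.0, error_rate=0.002, temp_k=341.0)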

    Real-Time Task Scheduling under Thermal Constraints

    As the speed of integrated circuits increases, so does their power consumption. Most of this power is turned into heat, which must be dissipated effectively for the circuit to avoid thermal damage. Thermal control has therefore emerged as an important issue in the design and management of circuits and systems. Dynamic speed scaling, where the input power is temporarily reduced by appropriately slowing down the circuit, is one of the major techniques for managing power so as to maintain safe temperature levels. In this study, we focus on thermally-constrained hard real-time systems, where timing guarantees must be met without exceeding safe temperature levels within the microprocessor. Speed-scaling mechanisms provided in many of today's processors offer opportunities to temporarily increase the processor speed beyond levels that would be safe over extended time periods. This dissertation addresses the problem of safely controlling the processor speed when scheduling mixed workloads with both hard real-time periodic tasks and non-real-time, but latency-sensitive, aperiodic jobs. We first introduce the Transient Overclocking Server, which safely reduces the response time of aperiodic jobs in the presence of hard real-time periodic tasks and thermal constraints. We then propose a design-time (off-line) execution-budget allocation scheme for the application of the Transient Overclocking Server. We show that there is an optimal budget allocation, which depends on the temporal characteristics of the aperiodic workload. In order to provide a quantitative framework for the allocation of budget during system design, we present a queuing model and validate the model with results from a discrete-event simulator. Next, we describe an on-line thermally-aware transient overclocking method to efficiently reduce the response time of aperiodic jobs at run time. We describe a modified Slack-Stealing algorithm that considers the thermal constraints of the system together with the deadline constraints of the periodic tasks. With the thermal model and temperature data provided by embedded thermal sensors, we compute slack for the aperiodic workload at run time that satisfies both thermal and temporal constraints. We show that the proposed Thermally-Aware Slack-Stealing algorithm minimizes the response times of aperiodic jobs while guaranteeing both the thermal safety of the system and the schedulability of the real-time tasks. The two proposed speed-control algorithms are examples of so-called proactive schemes, since they rely on a prediction of the thermal trajectory to control the temperature before safe levels are exceeded. In practice, the effectiveness of proactive speed control for the thermal management of a system relies on the accuracy of the thermal model that underlies the prediction of the effects of speed scaling and task execution on the temperature of the processor. Due to variations in the manufacturing of the circuit and in the environment in which it operates, an accurate thermal model can be obtained only at deployment time. The absence of power data makes a straightforward derivation of such a model impossible. We therefore study and describe a methodology to efficiently infer the thermal model from monitored system temperatures and the number of instructions used for task executions.
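
    Proactive transient overclocking of the kind described above hinges on predicting the thermal trajectory before the speed is raised. The sketch below uses a simple lumped RC thermal model to estimate how long a core can stay overclocked before reaching the temperature limit; the model form and all parameter values are assumptions for illustration, since the dissertation infers the actual model at deployment time from temperature sensors and instruction counts.

    import math

    def overclock_budget_s(t_now_k: float, t_amb_k: float, t_max_k: float,
                           boost_power_w: float, r_k_per_w: float, c_j_per_k: float) -> float:
        # Lumped RC model: dT/dt = (t_amb + P*R - T) / (R*C). Return how many
        # seconds the core can run at boost power before T reaches t_max.
        t_ss = t_amb_k + boost_power_w * r_k_per_w   # steady-state temperature at boost power
        if t_ss <= t_max_k:
            return math.inf                          # boosting never violates the limit
        if t_now_k >= t_max_k:
            return 0.0                               # already at the limit: no budget
        tau = r_k_per_w * c_j_per_k                  # thermal time constant (seconds)
        return -tau * math.log((t_max_k - t_ss) / (t_now_k - t_ss))

    # Example with assumed numbers: 318 K die, 298 K ambient, 353 K limit,
    # 60 W boost power, R = 1.2 K/W, C = 20 J/K  ->  roughly 27 s of budget.
    budget = overclock_budget_s(318.0, 298.0, 353.0, 60.0, 1.2, 20.0)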

    G-PUF: a software-only PUF for GPUs

    Physical Unclonable Functions (PUFs) are security primitives which allow the generation of unique IDs and security keys. Their security stems from the inherent process variations of silicon chip manufacturing and the minute random effects introduced in integrated circuits. PUFs are usually manufactured specifically for this purpose, but in the last few years several proposals have developed PUFs from off-the-shelf components. These Intrinsic PUFs avoid modifications to the hardware and exploit the low cost of adapting existing technologies. Graphics Processing Units (GPUs) present themselves as promising candidates for an Intrinsic PUF. GPUs are massively multi-processed systems originally built for graphics computing and more recently redesigned for general computing. These devices are distributed across a variety of systems and application environments, from computer-vision platforms to server clusters and home computers. Building PUFs with software-only strategies is a challenging problem, since a PUF must evaluate process variations without degrading system performance, characteristics which are more easily achieved in hardware. In this work we present G-PUF, an intrinsic PUF technology running entirely on CUDA. The proposed solution maps the distribution of soft errors in matrix multiplications when the GPU is running under the adverse conditions of overclocking and undervoltage. The resulting error map is unique to each GPU, and using a novel Challenge-Response Pair extraction algorithm, G-PUF is able to retrieve secure keys or a device ID without disclosing information about the PUF randomness. The system was tested in real setups and requires no modifications whatsoever to an already operational GPU. G-PUF was capable of achieving upwards of 94.73% reliability without any error-correction code and can provide up to 253 unique Challenge-Response Pairs.
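
    To make the error-map idea concrete, the toy sketch below compares a stressed matrix multiplication against a golden host result and hashes a challenge-selected subset of the differing positions into a response. It is only a data-flow illustration under assumed interfaces: run_matmul_on_gpu stands in for the CUDA kernel executed under overclock and undervoltage, and the challenge-response extraction shown here is not the algorithm proposed in the work.

    import hashlib
    import numpy as np

    def run_matmul_on_gpu(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Placeholder for the CUDA multiplication run while the GPU is
        # overclocked/undervolted; real faults would corrupt some output cells.
        return a @ b

    def error_map(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Boolean map of output positions where the stressed result differs
        # from the golden host result; this map is what is device-unique.
        return run_matmul_on_gpu(a, b) != a @ b

    def response(err_map: np.ndarray, challenge: bytes) -> bytes:
        # Toy extraction: the challenge seeds a selection of map positions
        # whose bits are hashed into a fixed-length response.
        rng = np.random.default_rng(int.from_bytes(challenge, "big"))
        idx = rng.choice(err_map.size, size=256, replace=False)
        bits = err_map.ravel()[idx].astype(np.uint8).tobytes()
        return hashlib.sha256(bits).digest()

    # Usage: m = error_map(np.random.rand(512, 512), np.random.rand(512, 512))
    #        key = response(m, b"challenge-01")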

    The impact of selected determinants on the perceived competitiveness and quality of processor manufacturers: a case study

    Purpose: The aim of the research was to compare Advanced Micro Devices Inc. and Intel processors with regard to selected market and technological determinants that influenced the development of AMD and contributed to the increase of its competitive advantage in the processor market. Design/Methodology/Approach: The research was carried out through an analysis of the company's documentation and annual reports, supported by statistical data from third parties such as the German retailer Mindfactory.de and the Steam community. Additionally, qualitative research was conducted among customers of AMD processors in order to evaluate the quality of its products and operations. The data analysis and empirical research showed an increase in the sales volume of AMD products as well as an increase in its share of the processor market in the years 2016-2020. Findings: The results indicate that advanced, innovative solutions have a significant impact on achieving a competitive advantage in the desktop processor market. Further determinants that play an important role in this respect are quality, price and building positive relationships through regular feedback. The research reveals that competition entails the need for greater innovation when consumers have stronger preferences for quality and lower prices. Practical Implications: The presented indicators may be useful for monitoring the performance of any company operating under technological competition, to ensure appropriate conditions for the development and growth of its competitiveness. Originality/Value: The specificity of competition based on the model of dynamic oligopoly in the desktop processor industry limits the possibility of using exactly the same determinants in other competitive markets, although some aspects are common and can be taken into consideration by other companies.

    Proposition for a Sequential Accelerator in Future General-Purpose Manycore Processors and the Problem of Migration-Induced Cache Misses

    As the number of transistors on a chip doubles with every technology generation, the number of on-chip cores also increases rapidly, making it possible in the foreseeable future to design processors featuring hundreds of general-purpose cores. However, though a large number of cores speeds up parallel code sections, Amdahl's law requires speeding up sequential sections too. We argue that it will become possible to dedicate a substantial fraction of the chip area and power budget to achieving high sequential performance. Current general-purpose processors contain a handful of cores designed to be continuously active and run in parallel. This leads to power and thermal constraints that limit each core's performance. We propose removing these constraints with a sequential accelerator (SACC). A SACC consists of several cores designed for ultimate sequential performance. These cores cannot run continuously: a single core is active at any time, while the rest of the cores are inactive and power-gated. We migrate the execution periodically to another core to spread heat generation uniformly over the whole SACC area, thus addressing the temperature issue. The SACC will be viable only if it yields significant sequential performance. Migration-induced cache misses may limit performance gains. We propose some solutions to mitigate this problem. We also investigate a migration method using thermal sensors, such that the migration interval depends on the ambient temperature and the migration penalty is negligible under normal thermal conditions.
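
    As a small illustration of the sensor-driven migration policy sketched in the abstract, the Python snippet below keeps one core active and migrates to the coolest idle core once the active core's thermal sensor crosses a threshold; the threshold value and the sensor interface are assumptions for illustration, not figures from the paper.

    from typing import List

    MIGRATE_AT_K = 350.0  # assumed per-core temperature threshold

    def next_core(temps_k: List[float], active: int) -> int:
        # Stay on the active core while its sensor reads below the threshold;
        # in cool ambient conditions it heats slowly, so migrations are rare
        # and the migration interval adapts to ambient temperature.
        if temps_k[active] < MIGRATE_AT_K:
            return active
        # Otherwise power-gate the hot core and resume on the coolest idle one,
        # spreading heat generation over the whole SACC area.
        idle = [c for c in range(len(temps_k)) if c != active]
        return min(idle, key=lambda c: temps_k[c])

    # Usage: nxt = next_core([349.0, 331.5, 335.2, 329.8], active=0)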